StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier
Abstract
DNA-binding proteins (DBPs) not only play an important role in all aspects of genetic activity, such as DNA replication, recombination, repair, and modification, but are also used as key components of antibiotics, steroids, and anticancer drugs in the field of drug discovery. Identifying DBPs has become one of the most challenging problems in proteomics research. Because experimental methods are expensive and inefficient, constructing a detailed DBP prediction model is an urgent problem for researchers. In this paper, we propose a stacked-ensemble-classifier-based method for predicting DBPs, called StackPDB. Firstly, pseudo amino acid composition (PseAAC), pseudo-position-specific scoring matrix (PsePSSM), position-specific scoring matrix-transition probability composition (PSSM-TPC), evolutionary distance transformation (EDT), and residue probing transformation (RPT) are applied to extract protein sequence features. Secondly, extreme gradient boosting-recursive feature elimination (XGB-RFE) is employed to select an excellent feature subset. Finally, the best features are fed to a stacked ensemble classifier composed of XGBoost, LightGBM, and SVM to construct StackPDB. Under leave-one-out cross-validation (LOOCV), StackPDB obtains high ACC and MCC on PDB1075, 93.44% and 0.8687, respectively. On the independent test datasets PDB186 and PDB180, the ACC is 84.41% and 90.00%, and the MCC is 0.6882 and 0.7997, respectively. The results on the training dataset and the independent test datasets show that StackPDB has great predictive ability for DBPs.
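The evaluation protocol named in the abstract (leave-one-out cross-validation scored by ACC and MCC) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the toy data is invented, and a 1-nearest-neighbour rule stands in for the stacked XGBoost/LightGBM/SVM classifier, which is not reproduced.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 meaning DNA-binding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def leave_one_out(n):
    """Yield (train_indices, test_index); each sample is held out exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def loocv_predictions(X, y, fit_predict):
    """Collect one held-out prediction per sample using fit_predict."""
    preds = []
    for train_idx, test_idx in leave_one_out(len(X)):
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]
        preds.append(fit_predict(X_train, y_train, X[test_idx]))
    return preds


def nearest_neighbour(X_train, y_train, x):
    """Toy 1-NN classifier: a stand-in, NOT the paper's stacked ensemble."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


# Invented toy feature vectors; real inputs would be the selected sequence features.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
preds = loocv_predictions(X, y, nearest_neighbour)
counts = confusion_counts(y, preds)
print("ACC:", accuracy(*counts), "MCC:", mcc(*counts))
```

Swapping `nearest_neighbour` for any function with the same `(X_train, y_train, x)` signature lets the same loop score a different model; with n samples the model is refit n times, which is why LOOCV is costly on larger datasets.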
Similar resources
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatolo...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
Classifier Ensemble Framework: a Diversity Based Approach
Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...
DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues
DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to pr...
Kernel-based machine learning protocol for predicting DNA-binding proteins
DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Attempts have been made to identify DNA-BPs based on their sequence and structural information with moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs where the classifier is Support Vector Machines ...
Journal
Journal title: Applied Soft Computing
Year: 2021
ISSN: 1568-4946, 1872-9681
DOI: https://doi.org/10.1016/j.asoc.2020.106921